AITopics | forget gate

Mamba is an effective state space model with linear computation complexity. It has recently shown impressive efficiency in dealing with high-resolution inputs across various vision tasks. In this paper, we reveal that the powerful Mamba model shares surprising similarities with linear attention Transformer, which typically underperform conventional Transformer in practice. By exploring the similarities and disparities between the effective Mamba and subpar linear attention Transformer, we provide comprehensive analyses to demystify the key factors behind Mamba's success. Specifically, we reformulate the selective state space model and linear attention within a unified formulation, rephrasing Mamba as a variant of linear attention Transformer with six major distinctions: input gate, forget gate, shortcut, no attention normalization, single-head, and modified block design.

artificial intelligence, linear attention transformer, proceedings, (7 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Vision (0.42)

Add feedback

Hierarchically Gated Recurrent Neural Network for Sequence Modeling

Neural Information Processing SystemsDec-25-2025, 20:40:16 GMT

Transformers have surpassed RNNs in popularity due to their superior abilities in parallel training and long-term dependency modeling.Recently, there has been a renewed interest in using linear RNNs for efficient sequence modeling.These linear RNNs often employ gating mechanisms in the output of the linear recurrence layer while ignoring the significance of using forget gates within the recurrence. In this paper, we propose a gated linear RNN model dubbed Hierarchically Gated Recurrent Neural Network (HGRN), which includes forget gates that are lower bounded by a learnable value.

hierarchically gated recurrent neural network, name change, sequence modeling, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Demystify Mamba in Vision: A Linear Attention Perspective Dongchen Han 1 Ziyi Wang

Neural Information Processing SystemsNov-20-2025, 05:57:34 GMT

Mamba is an effective state space model with linear computation complexity.

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

North America > Mexico > Gulf of Mexico (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

H-Infinity Filter Enhanced CNN-LSTM for Arrhythmia Detection from Heart Sound Recordings

Kumar, Rohith Shinoj, Dinda, Rushdeep, Tyagi, Aditya, B., Annappa, R, Naveen Kumar M.

arXiv.org Artificial IntelligenceNov-5-2025

Early detection of heart arrhythmia can prevent severe future complications in cardiac patients. While manual diagnosis still remains the clinical standard, it relies heavily on visual interpretation and is inherently subjective. In recent years, deep learning has emerged as a powerful tool to automate arrhythmia detection, offering improved accuracy, consistency, and efficiency. Several variants of convolutional and recurrent neural network architectures have been widely explored to capture spatial and temporal patterns in physiological signals. However, despite these advancements, current models often struggle to generalize well in real-world scenarios, especially when dealing with small or noisy datasets, which are common challenges in biomedical applications. In this paper, a novel CNN-H-Infinity-LSTM architecture is proposed to identify arrhythmic heart signals from heart sound recordings. This architecture introduces trainable parameters inspired by the H-Infinity filter from control theory, enhancing robustness and generalization. Extensive experimentation on the PhysioNet CinC Challenge 2016 dataset, a public benchmark of heart audio recordings, demonstrates that the proposed model achieves stable convergence and outperforms existing benchmarks, with a test accuracy of 99.42% and an F1 score of 98.85%.

artificial intelligence, classification, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2511.02379

Country:

Asia > India > Karnataka (0.04)
North America > United States (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

e618724ac897c6cf3fbfb273f8695d67-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 19:49:50 GMT

artificial intelligence, machine learning, natural language, (15 more...)

Neural Information Processing Systems

Country:

North America > Mexico > Gulf of Mexico (0.04)
Asia > China (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)

Add feedback

xLSTM: Extended Long Short-Term Memory Maximilian Beck

Neural Information Processing SystemsOct-10-2025, 15:47:44 GMT

Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale.

machine learning, natural language, xlstm, (19 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Austria > Upper Austria > Linz (0.04)
Oceania > Australia (0.04)
(5 more...)

Genre: Research Report > New Finding (0.45)

Industry:

Health & Medicine (1.00)
Media > News (0.45)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing Adaptive History Reserving by Spiking Convolutional Block Attention Module in Recurrent Neural Networks Qi Xu1Y uyuan Gao

Neural Information Processing SystemsOct-9-2025, 05:47:18 GMT

The performance of our model was evaluated in DVS128-Gesture dataset and other time-series datasets.

artificial intelligence, information, machine learning, (19 more...)

Neural Information Processing Systems

Country: Asia > China > Liaoning Province > Dalian (0.04)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)

Add feedback

TRA: Better Length Generalisation with Threshold Relative Attention

Opper, Mattia, Fernandez, Roland, Smolensky, Paul, Gao, Jianfeng

arXiv.org Artificial IntelligenceOct-7-2025

We test whether these limitations can be explained through two key failures of the self-attention mechanism. The first is the inability to fully remove irrelevant information. The second is tied to position, even if the dot product between a key and query is highly negative (i.e. an irrelevant key) learned positional biases may unintentionally up-weight such information - dangerous when distances become out of distribution. Put together, these two failure cases lead to compounding generalisation difficulties. We test whether they can be mitigated through the combination of a) selective sparsity - completely removing irrelevant keys from the attention softmax and b) contextualised relative distance - distance is only considered as between the query and the keys that matter. We show how refactoring the attention mechanism with these two mitigations in place can substantially improve the generalisation capabilities of decoder only transformers.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2503.23174

Country: